New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WISH: Check with _R_CHECK_SUGGESTS_ONLY_=true
to identify missing package dependencies
#248
Comments
Thanks @HenrikBengtsson. The fact that this feature is implemented via symlinks makes a lot of sense. I wonder if this works on Windows where symlinks used to be problematic. Even though I noticed that with recent versions of Windows the situation seemed to have improved significantly. Maybe R core decided to make this feature optional because of portability concerns? Anyways I think we should try to add this to the 3.17 builds @jwokaty. This will introduce a lot of new failures so we should announce on the bioc-devel mailing list! Also we'll need to keep a close eye on what happens on Windows and Mac. Thanks! |
It looks like the win-builder service detects
It also works on MS Windows on MS Windows running on GitHub Actions, e.g. https://github.com/HenrikBengtsson/teeny/actions/runs/3595702927/jobs/6055514686 You could start by enabling this on Linux first, then macOS, and lastly MS Windows. |
This looks promising, thanks. The way we deploy Renviron.bioc on the build machines is not platform-specific at the moment so Anyways, we'll cross that bridge when we get there. For now we're going to add this to the devel builds only so not a big deal. Isn't that what devel is about? 😉 We can always reverse if things go wrong. |
I can't reproduce the tradeSeq example.
I ran
and got ...
Session Info:
|
Could it be that you've installed non-base R packages to the system R library? Which package does: $ Rscript --vanilla -e "dir(rev(.libPaths())[1])" list? |
Indeed I have. Everything goes into the system library on my laptop. So the sandbox has no fence? |
The way I had envisioned the sandbox, one would put in the sandbox links to the "base" packages and the packages in the fully traversed dependency graph of all declared dependencies of the package being tested. The check would occur against this sandbox library. But if "base" means "all packages in system library" this will not achieve the aim. |
If the package fails to declare anything needed outside of
it should fail. |
The sandbox runs with the system R library + a temporary R library holding all the declared package dependencies. The only way to support your use case is for R to know which the base-R packages are and figure out to hide anything else. The only way I can imagine that to work is for R to not include the system R library in these tests, but only the temporary R library. I'm not sure if anyone attempted to do that, but I think it'll be hard, because many of the base-R packages are already loaded when FWIW, for this and other reasons, I argue that R should prevent installing anything into the system R library after R has been installed. Nothing in the R installation folder should be modified after R has been installation. The CRAN-provided R installation for macOS has this problem, where it sets write permissions on the R system library folder, causing I argue that not even the "recommended" packages (e.g. codetools and Matrix) should be installed there; they could go into |
There are more base-R packages than those; $ Rscript --vanilla -e "dir(rev(.libPaths())[1])"
[1] "base" "compiler" "datasets" "graphics" "grDevices"
[6] "grid" "methods" "parallel" "splines" "stats"
[11] "stats4" "tcltk" "tools" "translations" "utils" Those packages come with all R installations and are never updated. They're only updated when R is updated. That's a property unique to base-R packages. |
I see your point and I agree that there is value to the typical separation of "system" and "user" packages. When I run your command above I get 980 packages ... and there should be more.... It would be nice to get a clear convention on this. |
I think that's a discussion for the R-devel list, but modulo some outliers, the de-facto standard is to have only base-R and "recommended" packages in the system R library. That's what most R installations install out of the box. |
BTW, as soon as I've installed R from source on Linux as myself, say R v4.2.2, I run |
By setting the mode of the |
... and move all packages except: > tools:::.get_standard_package_names()
$base
[1] "base" "tools" "utils" "grDevices" "graphics" "stats"
[7] "datasets" "methods" "grid" "splines" "stats4" "tcltk"
[13] "compiler" "parallel"
$recommended
[1] "MASS" "lattice" "Matrix" "nlme" "survival"
[6] "boot" "cluster" "codetools" "foreign" "KernSmooth"
[11] "rpart" "class" "nnet" "spatial" "mgcv" to your |
Oh, I think UPDATE: translations is not actually a real package, although it has a DESCRIPTION file and |
Wait.. what? That's really unfortunate. So this means that |
Most
Note, There is also Section '1.1.3.1 Suggested packages' of 'Writing R Extensions' (https://cran.r-project.org/doc/manuals/R-exts.html) says:
To me, it comes down to whether or not you want to find these missing package declarations or not(*). If so, the current design of R requires that you keep (*) If you choose not, you'll have packages that won't be fully installed in air-gapped compute environments. Also, those packages cannot be fully tested in reverse-dependency checks, which means they won't benefit from the revdep checks and vice versa. CRAN packages don't have these problems, since CRAN has had these types of checks in place for years. (I struggle and spend lots of extra time with the revdep failures every time there are Bioconductor packages involved - unless I choose not to care about Bioconductor packages.) |
BTW, if you install everything to the system R library, how do you keep separate Bioconductor release and devel libraries during half of the year when they both share the same R version/installation? |
This last question is really something each user/developer has to address separately, unfortunately. In my case, I wind up with variously named versions of R devoted to one or another bioc stream. As for the build system, since release and devel are on different hardware the question doesn't apply, I think. I'd like to think that the task of identifying undeclared dependencies could be solved without addressing this concern. The fact that the behavior of R CMD check varies from installation to installation is driven by the fact that R itself has installation-dependent behaviors and uses. The dependency analysis doesn't have to come out of the R CMD check process, though it would be nice if it did. Another approach may be needed. |
Note that these type of missing package dependencies can only be detected at run-time. Static-code analysis will not be able to detect those, unless it fully emulates the R engine. Thus, you need a way that only tests with the base-R packages without any additional packages installed beyond the initial R installation. As a refresher, $ R_LIBS="$(mktemp -d)" R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".libPaths()"
[1] "/tmp/hb/tmp.nQBwsilHpr"
[2] "/path/to/R-4.2.2/lib/R/library" Importantly, it is not possible to remove or replace $ R_LIBS=NULL R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".libPaths()"
[1] "/path/to/R-4.2.2/lib/R/library"
$ R_LIBS=NULL R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".Library"
[1] "/path/to/R-4.2.2/lib/R/library" So, installing other packages to $ R --quiet --vanilla
> chooseCRANmirror(ind = 1)
> install.packages("listenv")
Warning in install.packages("listenv") :
'lib = "/path/to/R-4.2.2/lib/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.2’
to install packages into? (yes/No/cancel) yes
trying URL 'https://cloud.r-project.org/src/contrib/listenv_0.8.0.tar.gz'
...
* DONE (listenv)
> find.package("listenv")
[1] "/home/hb/R/x86_64-pc-linux-gnu-library/4.2/listenv" After this, all R packages will be installed to the above
I agree this is unfortunate, and it's even more unfortunately that Simon's macOS CRAN binary makes That said, all the tools are there to detect missing dependencies. I don't see a good reason for Bioconductor servers having to install other packages to ## Make .Library read-only
chmod -R ugo-w "$(Rscript --vanilla -e "cat(.Library)")"
## Install other packages in a library that is specific
## to the current platform, version of R, and version of Bioconductor
BIOC_VERSION=3.17
R_LIBS_USER=~/R/%p-library/%v-bioc_${BIOC_VERSION}
mkdir -p "${R_LIBS_USER}"
export R_LIBS_USER That also have the benefit that you can use the same R installation for different Bioconductor versions. All R packages will from now on be installed under: Rscript --vanilla -e ".libPaths()[1]"
[1] "x86_64-pc-linux-gnu-library/4.2-bioc_3.17" Since |
I confirm. Sure, we could change that. However that's the default setup when you install R from source as a regular user on Linux (i.e. if you skip the So even if we decided to change the setup on the build machines, users/developpers won't have it so won't be able to reproduce what they see on the build report. I really like the idea of identifying undeclared dependency, but not at any cost. With the
So after some internal discussion, we're kind of hesitating to do this. We'll need a little bit more time to think about it. Thanks for your input on this. It's appreciated and we've learned a lot. |
Why would you do that? It's a step that only takes 10-20 seconds to complete. Aren't you worried that you're running with non-intended things in the R installation?
The way I see it, the current setup is making life easier for package developers (because they don't have to fix these bugs because they are not reported in the first place). The cost for this is an increased burden on end-users who have to deal with incomplete packages, because of incomplete testing.
Hmm, did you see my four line code snippet how to do this?
Huh, with a clean I'm honestly surprised I have to argue this much for higher-coverage testing that is already available and implemented. Just to be clear, all Bioconductor package with |
We do out-of-tree compilation so we end up with a clean installation, as documented here: https://github.com/Bioconductor/BBS/blob/master/Doc/Prepare-Ubuntu-20.04-HOWTO.md#configure-and-compile-r This is a very convenient way to manage different R installations under a non-sudoer account on a Linux machine. No need to Anyways, it doesn't matter so much what we do on the build machines. As I said, we could change that. The issue is that this is a very common way to install R among developers, and that produces a setup that is not compatible with We also install Simon's binaries on Mac, like 99% of Mac users, and that also produces a setup that is not compatible with
Higher-coverage testing is good. But in the case of |
With
Ok, sorry, I interpreted it as this was a done deal. |
We've made some progress on this and will announce something soon. Developers might want to come here to provide some feedback so I'm re-opening the issue. |
Thank you sooo much for doing this! I'm confident that this will increase the quality of several Bioconductor packages. |
(This replaces invalid feature requests #242, #243, and #244.)
Background
Using
_R_CHECK_SUGGESTS_ONLY_=true
will runR CMD check
with sandboxed an R library, where on declared package dependencies inDESCRIPTION
+ recursive hard dependencies are included. If an example, package test, or a vignette needs a package that is not in this set of dependencies, it will not be available to be loaded and result in an error.Wish
Add
_R_CHECK_SUGGESTS_ONLY_=true
toRenviron.bioc
to detect missing package dependencies.R CMD check
sets up a sandboxed R library for it's life span. It uses symbolic file links to do this, so the performance overhead for doing this is really small. I could not detect a performance penalty (see below for details).Examples
I ran the following examples using
Renviron.bioc
for Bioconductor 3.17:I also ran with:
export _R_CHECK_TESTS_NLINES_=17
to make it clear what one of the ERRORs is about.
Example 1: Package bnem (missing RUnit for tests and BiocStyle for vignettes)
Downloading bnem package source tarball:
Current Bioconductor 3.17 checks
When checking with:
package passes with a single NOTE.
With also
_R_CHECK_SUGGESTS_ONLY_=true
When checking with:
the package fails with two ERRORs:
and
Status: 2 ERRORs, 1 NOTE
See
‘/tmp/bioc2/bnem.Rcheck/00check.log’
for details.
The problem is that bnem is missing:
Example 2: Package tradeSeq (missing DelayedMatrixStats for examples and tests)
Downloading tradeSeq package source tarball:
Current Bioconductor 3.17 checks
When checking with:
package passes with three NOTEs.
With also
_R_CHECK_SUGGESTS_ONLY_=true
When checking with:
the package fails with two ERRORs:
and
checking for unstated dependencies in ‘tests’ ... OK
checking tests ...
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 17 lines of output:
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('testConditions.R:23'): (code run outside of
test_that()
) ──────────<packageNotFoundError/error/condition>
Error in
loadNamespace(x)
: there is no package called 'DelayedMatrixStats'Backtrace:
▆
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 136 ]
Error: Test failures
Execution halted
checking for unstated dependencies in vignettes ... OK
checking package vignettes in ‘inst/doc’ ... OK
checking running R code from vignettes ...
‘Monocle.Rmd’ using ‘UTF-8’... OK
‘fitGAM.Rmd’ using ‘UTF-8’... OK
‘multipleConditions.Rmd’ using ‘UTF-8’... OK
‘tradeSeq.Rmd’ using ‘UTF-8’... OK
NONE
checking re-building of vignette outputs ... OK
checking PDF version of manual ... OK
DONE
Status: 2 ERRORs, 3 NOTEs
See
‘/tmp/bioc2/tradeSeq.Rcheck/00check.log’
for details.
The problem is that tradeSeq is missing:
Performance overhead
Running
R CMD check
with and without_R_CHECK_SUGGESTS_ONLY_=true
on progressr, which is an all OK package with lots of Suggests:ed dependencies that are also "heavy", confirms that there should be little, near-zero, overhead added.With:
Without:
The text was updated successfully, but these errors were encountered: